Keyword Extraction using Multiple Novel Features
Authors
Abstract
In this paper, we propose a novel approach for keyword extraction. Unlike previous keyword extraction methods, which identify keywords based on the document alone, this approach also draws on Wikipedia knowledge and document genre. Keyword extraction is performed by a classification model that uses not only traditional word-based features but also features based on Wikipedia knowledge and document genre. In our experiment, this approach outperforms previous keyword extraction models in terms of precision and recall and breaks through the plateau previously reached in the field.
Similar resources
Keyword Extraction and Headline Generation Using Novel Word Features
We introduce several novel word features for keyword extraction and headline generation. These new word features are derived according to the background knowledge of a document as supplied by Wikipedia. Given a document, to acquire its background knowledge from Wikipedia, we first generate a query for searching the Wikipedia corpus based on the key facts present in the document. We then use the...
Integrating Semantic Relatedness and Words' Intrinsic Features for Keyword Extraction
Keyword extraction attracts much attention for its significant role in various natural language processing tasks. While some existing methods for keyword extraction have considered using a single type of semantic relatedness between words or inherent attributes of words, almost all of them ignore two important issues: 1) how to fuse multiple types of semantic relations between words into a unifor...
Automatic Keyword Extraction from Documents Using Conditional Random Fields
Keywords are a subset of the words or phrases in a document that can describe the meaning of the document. Many text mining applications can benefit from them. Unfortunately, a large portion of documents still do not have keywords assigned. On the other hand, manual assignment of high-quality keywords is expensive, time-consuming, and error-prone. Therefore, most algorithms and systems aimed t...
A Knowledge-Base Oriented Approach for Automatic Keyword Extraction
Automatic keyword extraction is an important subfield of information extraction. It is a difficult task, for which numerous different techniques and resources have been proposed. In this paper, we propose a generic approach to extract keywords from documents using encyclopedic knowledge. Our two-step approach first relies on a classification step for identifying candidate keywords followed b...
A Fuzzy Logic Based Improved Keyword Extraction From Meeting Transcripts
Keyword extraction is the process of assigning keywords to a document, where the important words are selected automatically by the system. The proposed framework extracts keywords from meeting transcripts using a fuzzy logic method. First, the given input is preprocessed. The preprocessed data is then passed to the feature extraction method. In this method three fea...